New kernel methods for phenotype prediction from genotype data.
Abstract
Phenotype prediction from genotype data is one of the most important problems in computational genetics. In this work, we propose a new kernel-based method (i.e., one built on the SVM, or support vector machine) for phenotype prediction from genotype data. In our method, we first infer multiple suboptimal haplotype candidates from each genotype using an HMM (hidden Markov model); the kernel matrix is then computed from the predicted haplotype candidates and their emission probabilities from the HMM. We validated the performance of our method through experiments on several datasets: an artificial dataset constructed with the program GeneArtisan, a real dataset of the NAT2 gene from the International HapMap Project, and a real dataset of genotypes of diseased individuals. The experiments show that our method outperforms ordinary naive kernel methods (i.e., those not based on haplotype prediction), especially in cases of strong LD (linkage disequilibrium).
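The abstract does not give the kernel formula, so the following is only a minimal sketch of one way a kernel matrix could be built from weighted haplotype candidates. Everything in it is an assumption for illustration: the names `match_fraction`, `pair_similarity`, `expected_kernel`, and `kernel_matrix`, the encoding of haplotypes as equal-length strings of alleles, the use of normalized candidate weights in place of the HMM probabilities, and the choice of an allele-match base kernel averaged over candidate pairs. It is not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical encoding: a haplotype is a string with one allele per SNP site,
# and a candidate is (haplotype 1, haplotype 2, weight). The weights stand in
# for HMM-derived probabilities and are assumed to sum to 1 per individual.
Candidate = Tuple[str, str, float]


def match_fraction(a: str, b: str) -> float:
    """Base kernel (assumed): fraction of sites where two haplotypes agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pair_similarity(c1: Candidate, c2: Candidate) -> float:
    """Average the base kernel over all four haplotype pairings, so the
    result does not depend on the order of haplotypes within a candidate."""
    haps1, haps2 = c1[:2], c2[:2]
    return sum(match_fraction(h1, h2) for h1 in haps1 for h2 in haps2) / 4.0


def expected_kernel(ind_i: List[Candidate], ind_j: List[Candidate]) -> float:
    """Weighted sum of candidate similarities between two individuals."""
    return sum(
        ci[2] * cj[2] * pair_similarity(ci, cj)
        for ci in ind_i
        for cj in ind_j
    )


def kernel_matrix(individuals: List[List[Candidate]]) -> List[List[float]]:
    """Full symmetric kernel matrix over all individuals."""
    return [[expected_kernel(a, b) for b in individuals] for a in individuals]


# Toy data: individual A has one certain candidate, individual B has two
# equally weighted candidates.
ind_a = [("00", "00", 1.0)]
ind_b = [("00", "00", 0.5), ("11", "11", 0.5)]
K = kernel_matrix([ind_a, ind_b])
```

A matrix of this kind could then be passed to any SVM implementation that accepts a precomputed kernel. Because the base kernel is averaged over candidates with fixed weights, the sketch lets uncertainty in haplotype inference soften the similarity between individuals instead of committing to a single predicted haplotype pair.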
Journal: Genome informatics. International Conference on Genome Informatics
Volume: 22
Publication date: 2010